Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add integration tests for neural query #36

Merged
merged 8 commits into from
Oct 31, 2022

Conversation

jmazanec15
Copy link
Member

Description

Adds a series of integration tests for neural query type. Adds shared functionality to base class as well as a utility class.

Issues Resolved

#14

Check List

  • New functionality includes testing.
    • All tests pass
  • New functionality has been documented.
    • New functionality has javadoc added
  • Commits are signed as per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

@jmazanec15 jmazanec15 requested a review from a team October 28, 2022 00:06
@jmazanec15 jmazanec15 added the backport 2.x Label will add auto workflow to backport PR to 2.x branch label Oct 28, 2022
import org.opensearch.common.xcontent.XContentFactory;
import org.opensearch.common.xcontent.XContentHelper;
import org.opensearch.common.xcontent.XContentType;
import org.opensearch.index.query.QueryBuilder;
import org.opensearch.rest.RestStatus;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is not the change which you have done, but can you change the base class of BaseNeuralSearchIT from OpenSearchRestTestCase to OpenSearchSecureRestTestCase. This will allow the secure clusters to be used for testing

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I can update.

* @return number of documents indexed to that index
*/
@SneakyThrows
public int getDocCount(String indexName) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why public? can't we make it protected?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, will update.

modelId.compareAndSet(null, prepareModel());
}

private void maybeInitializeIndex(String indexName) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

better name for function can be : initializeIndexIfNotExist

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update

modelId.compareAndSet(null, prepareModel());
}

private void maybeInitializeIndex(String indexName) throws IOException {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[nit-pick] move this private function after all the public and protected functions.

}

@SneakyThrows
public void testBasicQuery() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion: as the knowledge of tests will fade as the time passes, lets add the query and response objects in the java doc so that we know what to expect for all the tests present in this class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will update

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Each response will be similar. I am going to just provide query.

}

@SneakyThrows
private String prepareModel() {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we move this to BaseNeuralSearchIT class?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

Comment on lines 228 to 234
private void prepareKnnIndex(String indexName, List<KNNFieldConfig> knnFieldConfigs) {
createIndexWithConfiguration(indexName, buildIndexConfiguration(knnFieldConfigs), "");
}

@SneakyThrows
private String buildIndexConfiguration(List<KNNFieldConfig> knnFieldConfigs) {
XContentBuilder xContentBuilder = XContentFactory.jsonBuilder()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move all of these to BaseNeuralSearchIT, as we have search, indexCreation and addKNNDoc all functions there. Plus I believe these can be reused across different ITs.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. Will move. Will leave initializeIndexIfNotExist because this is more specific to this particular test suite.

Comment on lines 121 to 164
public float[] runInference(String modelId, String queryText) {
Response inferenceResponse = makeRequest(
client(),
"POST",
String.format(LOCALE, "/_plugins/_ml/_predict/text_embedding/%s", modelId),
null,
toHttpEntity(String.format(LOCALE, "{\"text_docs\": [\"%s\"],\"target_response\": [\"sentence_embedding\"]}", queryText)),
ImmutableList.of(new BasicHeader(HttpHeaders.USER_AGENT, "Kibana"))
);

Map<String, Object> inferenceResJson = XContentHelper.convertToMap(
XContentFactory.xContent(XContentType.JSON),
EntityUtils.toString(inferenceResponse.getEntity()),
false
);

Object inference_results = inferenceResJson.get("inference_results");
assertTrue(inference_results instanceof List);
List<Object> inferenceResultsAsMap = (List<Object>) inference_results;
assertEquals(1, inferenceResultsAsMap.size());
Map<String, Object> result = (Map<String, Object>) inferenceResultsAsMap.get(0);
List<Object> output = (List<Object>) result.get("output");
assertEquals(1, output.size());
Map<String, Object> map = (Map<String, Object>) output.get(0);
List<Float> data = ((List<Double>) map.get("data")).stream().map(Double::floatValue).collect(Collectors.toList());
return vectorAsListToArray(data);
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was the reason for not using the MLCommonsClient here and directly use the inference API?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, not really any reason. I can switch to it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah actually, I think we need to use rest because from these tests we only have access to RestClient. MLCommonsClientAccessor requires a MachineLearningNodeClient which requires a Client which is internal to the cluster.

"framework_type": "sentence_transformers",
"all_config": "{\"architectures\":[\"BertModel\"],\"max_position_embeddings\":512,\"model_type\":\"bert\",\"num_attention_heads\":12,\"num_hidden_layers\":6}"
},
"url": "https://github.com/opensearch-project/ml-commons/blob/2.x/ml-algorithms/src/test/resources/org/opensearch/ml/engine/algorithms/text_embedding/all-MiniLM-L6-v2_torchscript_sentence-transformer.zip?raw=true"
"url": "https://github.com/jmazanec15/k-NN-1/blob/model/traced_small_model.zip?raw=true"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Temporary: Will update URL to ml-commons once uploaded

Copy link

@ylwu-amzn ylwu-amzn Oct 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool just switched it.

@jmazanec15 jmazanec15 requested a review from navneet1v October 28, 2022 23:13
@jmazanec15 jmazanec15 requested a review from ylwu-amzn October 31, 2022 16:07
@jmazanec15
Copy link
Member Author

Increasing the heap size of a node in the test cluster to 1GB. Default is 512mb which may be too small for ml - https://github.com/opensearch-project/OpenSearch/blob/main/buildSrc/src/main/java/org/opensearch/gradle/testclusters/OpenSearchNode.java#L845


import com.google.common.collect.ImmutableList;

public abstract class BaseNeuralSearchIT extends OpenSearchRestTestCase {
public abstract class BaseNeuralSearchIT extends OpenSearchSecureRestTestCase {

private static final Locale LOCALE = Locale.getDefault();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a issue for this PR. Looks like Locale.getDefault() seem to have issue with Windows shall we use Locale.ROOT? Take it in other PR

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let use Locale.ROOT.

I think you have not rebased the code, this statement has been updated.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will rebase

* }
*/
@SneakyThrows
public void testRescoreQuery() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add some function to rescore like weight?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didnt think it was necessary given that weight will be outside of the neural query

Comment on lines 155 to 156
/**
* Tests bool should query:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about bool query one with neural and one with BM25?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will add

Adds a series of integration tests for neural query type. Adds shared
functionality to base class as well as a utility class.

Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Signed-off-by: John Mazanec <[email protected]>
Copy link
Member

@vamshin vamshin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

@jmazanec15 jmazanec15 merged commit e30285b into opensearch-project:main Oct 31, 2022
opensearch-trigger-bot bot pushed a commit that referenced this pull request Oct 31, 2022
Adds a series of integration tests for neural query type. Adds shared
functionality to base class as well as a utility class. Increase test
cluster heap to 1 GB.

Signed-off-by: John Mazanec <[email protected]>
(cherry picked from commit e30285b)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport 2.x Label will add auto workflow to backport PR to 2.x branch Infrastructure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants